Project: Investigation of FBI GUN-CHECK¶

Table of Contents¶

  • Introduction
  • Data Wrangling
  • Exploratory Data Analysis
  • Conclusions

INTRODUCTION

This project is about the analysis of FBI Gun-check, The Data sources are from FBI's National Instant Criminal Background Check System. The U.S. census data is found in a .csv file that contains several variables at the state level, and the NICS data is found in one sheet of an .xlsx file that contains the number of firearm checks by month, state, and type. we are going to explore the datasets to answer and get insight to the following Question;

what is the correlation between state and the gun purchasing order?. what is the overall trending of gun purchase?. what is state has the highest growth in Gun registration?.

In [1]:
# packages installation and loading of dataset

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import plotly.express as px

Data Wrangling¶

In [2]:
# Load your data and print out a few lines. Perform operations to inspect data
#   types and look for instances of missing or possibly errant data.

df_gun = pd.read_excel(r'C:\Users\k2k\Desktop\udacity alx projects\project 1\gun_data.xlsx')
df_census = pd.read_csv(r'C:\Users\k2k\Desktop\udacity alx projects\project 1\U.S. Census Data.csv')
In [3]:
#let check the top 5 row of df_gun
df_gun.head()
Out[3]:
month state permit permit_recheck handgun long_gun other multiple admin prepawn_handgun ... returned_other rentals_handgun rentals_long_gun private_sale_handgun private_sale_long_gun private_sale_other return_to_seller_handgun return_to_seller_long_gun return_to_seller_other totals
0 2017-09 Alabama 16717.0 0.0 5734.0 6320.0 221.0 317 0.0 15.0 ... 0.0 0.0 0.0 9.0 16.0 3.0 0.0 0.0 3.0 32019
1 2017-09 Alaska 209.0 2.0 2320.0 2930.0 219.0 160 0.0 5.0 ... 0.0 0.0 0.0 17.0 24.0 1.0 0.0 0.0 0.0 6303
2 2017-09 Arizona 5069.0 382.0 11063.0 7946.0 920.0 631 0.0 13.0 ... 0.0 0.0 0.0 38.0 12.0 2.0 0.0 0.0 0.0 28394
3 2017-09 Arkansas 2935.0 632.0 4347.0 6063.0 165.0 366 51.0 12.0 ... 0.0 0.0 0.0 13.0 23.0 0.0 0.0 2.0 1.0 17747
4 2017-09 California 57839.0 0.0 37165.0 24581.0 2984.0 0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 123506

5 rows × 27 columns

In [4]:
#let check top 5 row of df_census

df_census.head()
Out[4]:
Fact Fact Note Alabama Alaska Arizona Arkansas California Colorado Connecticut Delaware ... South Dakota Tennessee Texas Utah Vermont Virginia Washington West Virginia Wisconsin Wyoming
0 Population estimates, July 1, 2016, (V2016) NaN 4,863,300 741,894 6,931,071 2,988,248 39,250,017 5,540,545 3,576,452 952,065 ... 865454 6651194 27,862,596 3,051,217 624,594 8,411,808 7,288,000 1,831,102 5,778,708 585,501
1 Population estimates base, April 1, 2010, (V2... NaN 4,780,131 710,249 6,392,301 2,916,025 37,254,522 5,029,324 3,574,114 897,936 ... 814195 6346298 25,146,100 2,763,888 625,741 8,001,041 6,724,545 1,853,011 5,687,289 563,767
2 Population, percent change - April 1, 2010 (es... NaN 1.70% 4.50% 8.40% 2.50% 5.40% 10.20% 0.10% 6.00% ... 0.063 0.048 10.80% 10.40% -0.20% 5.10% 8.40% -1.20% 1.60% 3.90%
3 Population, Census, April 1, 2010 NaN 4,779,736 710,231 6,392,017 2,915,918 37,253,956 5,029,196 3,574,097 897,934 ... 814180 6346105 25,145,561 2,763,885 625,741 8,001,024 6,724,540 1,852,994 5,686,986 563,626
4 Persons under 5 years, percent, July 1, 2016, ... NaN 6.00% 7.30% 6.30% 6.40% 6.30% 6.10% 5.20% 5.80% ... 0.071 0.061 7.20% 8.30% 4.90% 6.10% 6.20% 5.50% 5.80% 6.50%

5 rows × 52 columns

Data Cleaning;¶

In [5]:
#checking for duplicate in df_gun

df_gun.duplicated().sum()
Out[5]:
0
In [6]:
#checking form missing value
df_gun.dropna().head()
Out[6]:
month state permit permit_recheck handgun long_gun other multiple admin prepawn_handgun ... returned_other rentals_handgun rentals_long_gun private_sale_handgun private_sale_long_gun private_sale_other return_to_seller_handgun return_to_seller_long_gun return_to_seller_other totals
0 2017-09 Alabama 16717.0 0.0 5734.0 6320.0 221.0 317 0.0 15.0 ... 0.0 0.0 0.0 9.0 16.0 3.0 0.0 0.0 3.0 32019
1 2017-09 Alaska 209.0 2.0 2320.0 2930.0 219.0 160 0.0 5.0 ... 0.0 0.0 0.0 17.0 24.0 1.0 0.0 0.0 0.0 6303
2 2017-09 Arizona 5069.0 382.0 11063.0 7946.0 920.0 631 0.0 13.0 ... 0.0 0.0 0.0 38.0 12.0 2.0 0.0 0.0 0.0 28394
3 2017-09 Arkansas 2935.0 632.0 4347.0 6063.0 165.0 366 51.0 12.0 ... 0.0 0.0 0.0 13.0 23.0 0.0 0.0 2.0 1.0 17747
4 2017-09 California 57839.0 0.0 37165.0 24581.0 2984.0 0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 123506

5 rows × 27 columns

In [7]:
#assigning correct data types
df_gun['month']= pd.to_datetime(df_gun['month'])
In [8]:
df_census.head()
Out[8]:
Fact Fact Note Alabama Alaska Arizona Arkansas California Colorado Connecticut Delaware ... South Dakota Tennessee Texas Utah Vermont Virginia Washington West Virginia Wisconsin Wyoming
0 Population estimates, July 1, 2016, (V2016) NaN 4,863,300 741,894 6,931,071 2,988,248 39,250,017 5,540,545 3,576,452 952,065 ... 865454 6651194 27,862,596 3,051,217 624,594 8,411,808 7,288,000 1,831,102 5,778,708 585,501
1 Population estimates base, April 1, 2010, (V2... NaN 4,780,131 710,249 6,392,301 2,916,025 37,254,522 5,029,324 3,574,114 897,936 ... 814195 6346298 25,146,100 2,763,888 625,741 8,001,041 6,724,545 1,853,011 5,687,289 563,767
2 Population, percent change - April 1, 2010 (es... NaN 1.70% 4.50% 8.40% 2.50% 5.40% 10.20% 0.10% 6.00% ... 0.063 0.048 10.80% 10.40% -0.20% 5.10% 8.40% -1.20% 1.60% 3.90%
3 Population, Census, April 1, 2010 NaN 4,779,736 710,231 6,392,017 2,915,918 37,253,956 5,029,196 3,574,097 897,934 ... 814180 6346105 25,145,561 2,763,885 625,741 8,001,024 6,724,540 1,852,994 5,686,986 563,626
4 Persons under 5 years, percent, July 1, 2016, ... NaN 6.00% 7.30% 6.30% 6.40% 6.30% 6.10% 5.20% 5.80% ... 0.071 0.061 7.20% 8.30% 4.90% 6.10% 6.20% 5.50% 5.80% 6.50%

5 rows × 52 columns

In [9]:
df_census = df_census[:64]

df_census
Out[9]:
Fact Fact Note Alabama Alaska Arizona Arkansas California Colorado Connecticut Delaware ... South Dakota Tennessee Texas Utah Vermont Virginia Washington West Virginia Wisconsin Wyoming
0 Population estimates, July 1, 2016, (V2016) NaN 4,863,300 741,894 6,931,071 2,988,248 39,250,017 5,540,545 3,576,452 952,065 ... 865454 6651194 27,862,596 3,051,217 624,594 8,411,808 7,288,000 1,831,102 5,778,708 585,501
1 Population estimates base, April 1, 2010, (V2... NaN 4,780,131 710,249 6,392,301 2,916,025 37,254,522 5,029,324 3,574,114 897,936 ... 814195 6346298 25,146,100 2,763,888 625,741 8,001,041 6,724,545 1,853,011 5,687,289 563,767
2 Population, percent change - April 1, 2010 (es... NaN 1.70% 4.50% 8.40% 2.50% 5.40% 10.20% 0.10% 6.00% ... 0.063 0.048 10.80% 10.40% -0.20% 5.10% 8.40% -1.20% 1.60% 3.90%
3 Population, Census, April 1, 2010 NaN 4,779,736 710,231 6,392,017 2,915,918 37,253,956 5,029,196 3,574,097 897,934 ... 814180 6346105 25,145,561 2,763,885 625,741 8,001,024 6,724,540 1,852,994 5,686,986 563,626
4 Persons under 5 years, percent, July 1, 2016, ... NaN 6.00% 7.30% 6.30% 6.40% 6.30% 6.10% 5.20% 5.80% ... 0.071 0.061 7.20% 8.30% 4.90% 6.10% 6.20% 5.50% 5.80% 6.50%
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
59 Nonminority-owned firms, 2012 NaN 272,651 51,147 344,981 189,029 1,819,107 442,365 259,614 54,782 ... 74228 434025 1,224,845 218,826 70,491 450,109 426,697 104,785 379,934 55,397
60 Veteran-owned firms, 2012 NaN 41,943 7,953 46,780 25,915 252,377 51,722 31,056 7,206 ... 8604 59379 213,590 18,754 8,237 76,434 49,331 12,912 39,830 6,470
61 Nonveteran-owned firms, 2012 NaN 316,984 56,091 427,582 192,988 3,176,341 469,524 281,182 60,318 ... 66219 469392 2,057,218 219,807 63,317 548,439 461,401 94,960 370,755 51,353
62 Population per square mile, 2010 NaN 94.4 1.2 56.3 56 239.1 48.5 738.1 460.8 ... 10.7 153.9 96.3 33.6 67.9 202.6 101.2 77.1 105 5.8
63 Land area in square miles, 2010 NaN 50,645.33 570,640.95 113,594.08 52,035.48 155,779.22 103,641.89 4,842.36 1,948.54 ... 75811 41234.9 261,231.71 82,169.62 9,216.66 39,490.09 66,455.52 24,038.21 54,157.80 97,093.14

64 rows × 52 columns

In [10]:
df_census.rename(columns=df_census.iloc[0]).drop(df_census.index[0])
df_census.head()
Out[10]:
Fact Fact Note Alabama Alaska Arizona Arkansas California Colorado Connecticut Delaware ... South Dakota Tennessee Texas Utah Vermont Virginia Washington West Virginia Wisconsin Wyoming
0 Population estimates, July 1, 2016, (V2016) NaN 4,863,300 741,894 6,931,071 2,988,248 39,250,017 5,540,545 3,576,452 952,065 ... 865454 6651194 27,862,596 3,051,217 624,594 8,411,808 7,288,000 1,831,102 5,778,708 585,501
1 Population estimates base, April 1, 2010, (V2... NaN 4,780,131 710,249 6,392,301 2,916,025 37,254,522 5,029,324 3,574,114 897,936 ... 814195 6346298 25,146,100 2,763,888 625,741 8,001,041 6,724,545 1,853,011 5,687,289 563,767
2 Population, percent change - April 1, 2010 (es... NaN 1.70% 4.50% 8.40% 2.50% 5.40% 10.20% 0.10% 6.00% ... 0.063 0.048 10.80% 10.40% -0.20% 5.10% 8.40% -1.20% 1.60% 3.90%
3 Population, Census, April 1, 2010 NaN 4,779,736 710,231 6,392,017 2,915,918 37,253,956 5,029,196 3,574,097 897,934 ... 814180 6346105 25,145,561 2,763,885 625,741 8,001,024 6,724,540 1,852,994 5,686,986 563,626
4 Persons under 5 years, percent, July 1, 2016, ... NaN 6.00% 7.30% 6.30% 6.40% 6.30% 6.10% 5.20% 5.80% ... 0.071 0.061 7.20% 8.30% 4.90% 6.10% 6.20% 5.50% 5.80% 6.50%

5 rows × 52 columns

In [11]:
#transposing df_census
df_census.set_index('Fact', inplace=True)
df_census = df_census.transpose().reset_index()
In [12]:
df_census.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 51 entries, 0 to 50
Data columns (total 65 columns):
 #   Column                                                                                  Non-Null Count  Dtype 
---  ------                                                                                  --------------  ----- 
 0   index                                                                                   51 non-null     object
 1   Population estimates, July 1, 2016,  (V2016)                                            50 non-null     object
 2   Population estimates base, April 1, 2010,  (V2016)                                      50 non-null     object
 3   Population, percent change - April 1, 2010 (estimates base) to July 1, 2016,  (V2016)   50 non-null     object
 4   Population, Census, April 1, 2010                                                       50 non-null     object
 5   Persons under 5 years, percent, July 1, 2016,  (V2016)                                  50 non-null     object
 6   Persons under 5 years, percent, April 1, 2010                                           50 non-null     object
 7   Persons under 18 years, percent, July 1, 2016,  (V2016)                                 50 non-null     object
 8   Persons under 18 years, percent, April 1, 2010                                          50 non-null     object
 9   Persons 65 years and over, percent,  July 1, 2016,  (V2016)                             50 non-null     object
 10  Persons 65 years and over, percent, April 1, 2010                                       50 non-null     object
 11  Female persons, percent,  July 1, 2016,  (V2016)                                        50 non-null     object
 12  Female persons, percent, April 1, 2010                                                  50 non-null     object
 13  White alone, percent, July 1, 2016,  (V2016)                                            51 non-null     object
 14  Black or African American alone, percent, July 1, 2016,  (V2016)                        51 non-null     object
 15  American Indian and Alaska Native alone, percent, July 1, 2016,  (V2016)                51 non-null     object
 16  Asian alone, percent, July 1, 2016,  (V2016)                                            51 non-null     object
 17  Native Hawaiian and Other Pacific Islander alone, percent, July 1, 2016,  (V2016)       51 non-null     object
 18  Two or More Races, percent, July 1, 2016,  (V2016)                                      50 non-null     object
 19  Hispanic or Latino, percent, July 1, 2016,  (V2016)                                     51 non-null     object
 20  White alone, not Hispanic or Latino, percent, July 1, 2016,  (V2016)                    50 non-null     object
 21  Veterans, 2011-2015                                                                     50 non-null     object
 22  Foreign born persons, percent, 2011-2015                                                50 non-null     object
 23  Housing units,  July 1, 2016,  (V2016)                                                  50 non-null     object
 24  Housing units, April 1, 2010                                                            50 non-null     object
 25  Owner-occupied housing unit rate, 2011-2015                                             50 non-null     object
 26  Median value of owner-occupied housing units, 2011-2015                                 50 non-null     object
 27  Median selected monthly owner costs -with a mortgage, 2011-2015                         50 non-null     object
 28  Median selected monthly owner costs -without a mortgage, 2011-2015                      50 non-null     object
 29  Median gross rent, 2011-2015                                                            50 non-null     object
 30  Building permits, 2016                                                                  50 non-null     object
 31  Households, 2011-2015                                                                   50 non-null     object
 32  Persons per household, 2011-2015                                                        50 non-null     object
 33  Living in same house 1 year ago, percent of persons age 1 year+, 2011-2015              50 non-null     object
 34  Language other than English spoken at home, percent of persons age 5 years+, 2011-2015  50 non-null     object
 35  High school graduate or higher, percent of persons age 25 years+, 2011-2015             50 non-null     object
 36  Bachelor's degree or higher, percent of persons age 25 years+, 2011-2015                50 non-null     object
 37  With a disability, under age 65 years, percent, 2011-2015                               50 non-null     object
 38  Persons  without health insurance, under age 65 years, percent                          50 non-null     object
 39  In civilian labor force, total, percent of population age 16 years+, 2011-2015          50 non-null     object
 40  In civilian labor force, female, percent of population age 16 years+, 2011-2015         50 non-null     object
 41  Total accommodation and food services sales, 2012 ($1,000)                              51 non-null     object
 42  Total health care and social assistance receipts/revenue, 2012 ($1,000)                 51 non-null     object
 43  Total manufacturers shipments, 2012 ($1,000)                                            51 non-null     object
 44  Total merchant wholesaler sales, 2012 ($1,000)                                          51 non-null     object
 45  Total retail sales, 2012 ($1,000)                                                       51 non-null     object
 46  Total retail sales per capita, 2012                                                     51 non-null     object
 47  Mean travel time to work (minutes), workers age 16 years+, 2011-2015                    50 non-null     object
 48  Median household income (in 2015 dollars), 2011-2015                                    50 non-null     object
 49  Per capita income in past 12 months (in 2015 dollars), 2011-2015                        50 non-null     object
 50  Persons in poverty, percent                                                             50 non-null     object
 51  Total employer establishments, 2015                                                     51 non-null     object
 52  Total employment, 2015                                                                  51 non-null     object
 53  Total annual payroll, 2015 ($1,000)                                                     51 non-null     object
 54  Total employment, percent change, 2014-2015                                             51 non-null     object
 55  Total nonemployer establishments, 2015                                                  50 non-null     object
 56  All firms, 2012                                                                         50 non-null     object
 57  Men-owned firms, 2012                                                                   50 non-null     object
 58  Women-owned firms, 2012                                                                 50 non-null     object
 59  Minority-owned firms, 2012                                                              50 non-null     object
 60  Nonminority-owned firms, 2012                                                           50 non-null     object
 61  Veteran-owned firms, 2012                                                               50 non-null     object
 62  Nonveteran-owned firms, 2012                                                            50 non-null     object
 63  Population per square mile, 2010                                                        50 non-null     object
 64  Land area in square miles, 2010                                                         50 non-null     object
dtypes: object(65)
memory usage: 26.0+ KB
In [13]:
#dropping the first row (fact_note)
df_census = df_census.drop([0])
In [14]:
#dropping unwanted columns in df_census
df_census  = df_census.drop(df_census.iloc[:,65:], axis = 1)
In [15]:
#checking for duplicate in  df_census

df_census.duplicated().sum()
Out[15]:
0
In [16]:
#checking for missing values in df_census

df_census.isnull().sum()
Out[16]:
Fact
index                                                                                    0
Population estimates, July 1, 2016,  (V2016)                                             0
Population estimates base, April 1, 2010,  (V2016)                                       0
Population, percent change - April 1, 2010 (estimates base) to July 1, 2016,  (V2016)    0
Population, Census, April 1, 2010                                                        0
                                                                                        ..
Nonminority-owned firms, 2012                                                            0
Veteran-owned firms, 2012                                                                0
Nonveteran-owned firms, 2012                                                             0
Population per square mile, 2010                                                         0
Land area in square miles, 2010                                                          0
Length: 65, dtype: int64
In [17]:
#removing %,$ and comma from the data set

col = df_census.columns

df_census[col] = df_census[col].replace({'\$': '', ',': '','\%':''}, regex=True)

Exploratory Data Analysis¶

Research Question 1 (What is the overall gun purchase?)¶

In [18]:
#creating a purchase_trend dataframe.

purchase_trend = pd.pivot_table(data = df_gun, index = 'month', columns= 'state', values= 'totals')

purchase_trend =  purchase_trend.reset_index()

purchase_trend['total'] = purchase_trend.iloc[:,1:].sum(axis = 1)

purchase_trend
Out[18]:
state month Alabama Alaska Arizona Arkansas California Colorado Connecticut Delaware District of Columbia ... Texas Utah Vermont Virgin Islands Virginia Washington West Virginia Wisconsin Wyoming total
0 1998-11-01 1062 145 379 589 2101 622 80 55 0 ... 2794 267 59 0 24 361 408 241 107 21176
1 1998-12-01 35506 3840 17074 21163 65344 23176 6790 2080 0 ... 79605 10415 2057 0 25170 11641 13786 15201 3379 870722
2 1999-01-01 18049 2278 12859 11953 56953 19503 6265 1128 1 ... 50992 5055 1043 0 14009 8695 8260 7780 2180 585974
3 1999-02-01 20583 2413 14546 15348 57471 22239 8069 1077 3 ... 55148 5933 1668 0 16053 9383 11206 10578 2643 690215
4 1999-03-01 19424 3206 14992 13720 68327 17287 7877 1314 2 ... 54096 6021 1941 0 17193 10551 10867 14891 2776 741687
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
222 2017-05-01 33331 6442 28734 16749 129307 36008 16149 4012 82 ... 109732 23844 2577 79 36007 45816 15085 37888 4128 1898840
223 2017-06-01 35687 6353 28359 16614 130187 36459 16504 3634 74 ... 118091 22138 2382 153 36838 46023 14498 36267 3975 1888266
224 2017-07-01 31474 6487 25714 14015 114595 33811 14525 2981 68 ... 98338 17925 2275 151 36183 38369 13970 32846 3398 1733308
225 2017-08-01 35038 6946 29376 17517 130901 37150 14019 3509 80 ... 110946 22010 2676 132 39114 42120 14770 40924 4111 1896457
226 2017-09-01 32019 6303 28394 17747 123506 35873 12117 3502 61 ... 128260 20041 3084 9 36446 43049 16723 37506 4281 1857226

227 rows × 57 columns

In [19]:
#plotting total purchase trend line graph

plt.figure(figsize=(40,20))


# Preparing the data to subplots
x = purchase_trend['month']
y = purchase_trend['total']

# Plot the subplots

plt.subplot(2, 2, 1)
plt.plot(x, y, 'g')

# set chart title
plt.title("total purchase trend" , fontsize=20)

#set Axis label
plt.xlabel("Year", fontsize=20)
plt.ylabel("total purchase", fontsize=20)

plt.show()
In [20]:
df_gun
Out[20]:
month state permit permit_recheck handgun long_gun other multiple admin prepawn_handgun ... returned_other rentals_handgun rentals_long_gun private_sale_handgun private_sale_long_gun private_sale_other return_to_seller_handgun return_to_seller_long_gun return_to_seller_other totals
0 2017-09-01 Alabama 16717.0 0.0 5734.0 6320.0 221.0 317 0.0 15.0 ... 0.0 0.0 0.0 9.0 16.0 3.0 0.0 0.0 3.0 32019
1 2017-09-01 Alaska 209.0 2.0 2320.0 2930.0 219.0 160 0.0 5.0 ... 0.0 0.0 0.0 17.0 24.0 1.0 0.0 0.0 0.0 6303
2 2017-09-01 Arizona 5069.0 382.0 11063.0 7946.0 920.0 631 0.0 13.0 ... 0.0 0.0 0.0 38.0 12.0 2.0 0.0 0.0 0.0 28394
3 2017-09-01 Arkansas 2935.0 632.0 4347.0 6063.0 165.0 366 51.0 12.0 ... 0.0 0.0 0.0 13.0 23.0 0.0 0.0 2.0 1.0 17747
4 2017-09-01 California 57839.0 0.0 37165.0 24581.0 2984.0 0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 123506
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
12480 1998-11-01 Virginia 0.0 NaN 14.0 2.0 NaN 8 0.0 NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN 24
12481 1998-11-01 Washington 1.0 NaN 65.0 286.0 NaN 8 1.0 NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN 361
12482 1998-11-01 West Virginia 3.0 NaN 149.0 251.0 NaN 5 0.0 NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN 408
12483 1998-11-01 Wisconsin 0.0 NaN 25.0 214.0 NaN 2 0.0 NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN 241
12484 1998-11-01 Wyoming 8.0 NaN 45.0 49.0 NaN 5 0.0 NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN 107

12485 rows × 27 columns

Research Question 2 ; Which states have had the highest growth in gun registrations?¶

In [21]:
#creating a new data set form df_gun 


df_gun_growth = df_gun.loc[:, ['month', 'state', 'totals']]

df_gun_growth.reset_index()
Out[21]:
index month state totals
0 0 2017-09-01 Alabama 32019
1 1 2017-09-01 Alaska 6303
2 2 2017-09-01 Arizona 28394
3 3 2017-09-01 Arkansas 17747
4 4 2017-09-01 California 123506
... ... ... ... ...
12480 12480 1998-11-01 Virginia 24
12481 12481 1998-11-01 Washington 361
12482 12482 1998-11-01 West Virginia 408
12483 12483 1998-11-01 Wisconsin 241
12484 12484 1998-11-01 Wyoming 107

12485 rows × 4 columns

In [22]:
#extracting year from the month column
df_gun_growth['year']= pd.to_datetime(df_gun_growth['month']).dt.strftime('%Y')
df_gun_growth['months']= pd.to_datetime(df_gun_growth['month']).dt.strftime('%m')

#dropping the month (date column)

df_gun_growth = df_gun_growth.drop(columns = 'month')
In [23]:
#extracting data for 1998 and 2017

df_gun_1998 = df_gun_growth.query('year == "1998"')

df_gun_2017 = df_gun_growth.query('year == "2017"')
In [24]:
#grouping by df_gun_1998 state


df_gun_1998 = df_gun_1998.groupby(['state'])['totals'].sum().reset_index()

#rename totals as 2017totals 

df_gun_1998 = df_gun_1998.rename(columns = {'totals': '1998totals'})
In [25]:
#grouping by df_gun_1998 state


df_gun_2017 = df_gun_2017.groupby(['state'])['totals'].sum().reset_index()
#rename totals as 2017totals 
df_gun_2017 = df_gun_2017.rename(columns = {'totals': '2017totals'})
In [26]:
#merging th two dataset

df_growth = df_gun_2017.merge(df_gun_1998, how = 'inner', on = 'state')

df_growth
Out[26]:
state 2017totals 1998totals
0 Alabama 336763 36568
1 Alaska 56902 3985
2 Arizona 279553 17453
3 Arkansas 161694 21752
4 California 1167528 67445
5 Colorado 345559 23798
6 Connecticut 138789 6870
7 Delaware 36826 2135
8 District of Columbia 754 0
9 Florida 912765 32341
10 Georgia 378921 3819
11 Guam 1372 9
12 Hawaii 9217 431
13 Idaho 131447 7339
14 Illinois 1163945 41477
15 Indiana 606881 22518
16 Iowa 110551 14650
17 Kansas 132840 12386
18 Kentucky 3417976 31869
19 Louisiana 224545 27642
20 Maine 68320 4106
21 Mariana Islands 106 0
22 Maryland 112680 9007
23 Massachusetts 154161 1443
24 Michigan 369053 30596
25 Minnesota 529302 12116
26 Mississippi 169056 26097
27 Missouri 362047 24087
28 Montana 88179 6406
29 Nebraska 50519 6655
30 Nevada 93734 6439
31 New Hampshire 95057 2442
32 New Jersey 77113 3795
33 New Mexico 111491 9033
34 New York 276227 14736
35 North Carolina 385554 38943
36 North Dakota 47906 2437
37 Ohio 540654 32481
38 Oklahoma 228878 20075
39 Oregon 240741 14024
40 Pennsylvania 763378 48317
41 Puerto Rico 12431 513
42 Rhode Island 18132 1059
43 South Carolina 295532 9565
44 South Dakota 68914 2751
45 Tennessee 549639 24773
46 Texas 1074971 82399
47 Utah 199279 10682
48 Vermont 27148 2116
49 Virgin Islands 925 0
50 Virginia 366994 25194
51 Washington 419522 12002
52 West Virginia 152784 14194
53 Wisconsin 387537 15442
54 Wyoming 37736 3486
In [27]:
#adding growth rate

df_growth['growth_rate'] = (df_growth['2017totals']- df_growth['1998totals']) / df_growth['1998totals']

df_growth = df_growth.sort_values(['growth_rate'], ascending = False).reset_index()

#removing the umwanted rows
df_growth = df_growth.drop(labels = [0,1,2], axis = 0)
df_growth= df_growth.drop(columns = {'index'}, axis= 0).reset_index()

df_growth
Out[27]:
index state 2017totals 1998totals growth_rate
0 3 Guam 1372 9 151.444444
1 4 Kentucky 3417976 31869 106.250808
2 5 Massachusetts 154161 1443 105.833680
3 6 Georgia 378921 3819 98.219953
4 7 Minnesota 529302 12116 42.686200
5 8 New Hampshire 95057 2442 37.925880
6 9 Washington 419522 12002 33.954341
7 10 South Carolina 295532 9565 29.897229
8 11 Florida 912765 32341 27.223153
9 12 Illinois 1163945 41477 27.062420
10 13 Indiana 606881 22518 25.950928
11 14 Wisconsin 387537 15442 24.096296
12 15 South Dakota 68914 2751 24.050527
13 16 Puerto Rico 12431 513 23.231969
14 17 Tennessee 549639 24773 21.187018
15 18 Hawaii 9217 431 20.385151
16 19 New Jersey 77113 3795 19.319631
17 20 Connecticut 138789 6870 19.202183
18 21 North Dakota 47906 2437 18.657776
19 22 New York 276227 14736 17.745046
20 23 Utah 199279 10682 17.655589
21 24 Idaho 131447 7339 16.910751
22 25 California 1167528 67445 16.310816
23 26 Delaware 36826 2135 16.248712
24 27 Oregon 240741 14024 16.166358
25 28 Rhode Island 18132 1059 16.121813
26 29 Ohio 540654 32481 15.645239
27 30 Maine 68320 4106 15.639065
28 31 Arizona 279553 17453 15.017476
29 32 Pennsylvania 763378 48317 14.799367
30 33 Missouri 362047 24087 14.030805
31 34 Virginia 366994 25194 13.566722
32 35 Nevada 93734 6439 13.557229
33 36 Colorado 345559 23798 13.520506
34 37 Alaska 56902 3985 13.279046
35 38 Montana 88179 6406 12.765064
36 39 Texas 1074971 82399 12.045923
37 40 Vermont 27148 2116 11.829868
38 41 Maryland 112680 9007 11.510270
39 42 New Mexico 111491 9033 11.342633
40 43 Michigan 369053 30596 11.062132
41 44 Oklahoma 228878 20075 10.401146
42 45 Wyoming 37736 3486 9.825014
43 46 West Virginia 152784 14194 9.763985
44 47 Kansas 132840 12386 9.725012
45 48 North Carolina 385554 38943 8.900470
46 49 Alabama 336763 36568 8.209227
47 50 Louisiana 224545 27642 7.123327
48 51 Nebraska 50519 6655 6.591134
49 52 Iowa 110551 14650 6.546143
50 53 Arkansas 161694 21752 6.433523
51 54 Mississippi 169056 26097 5.477986
In [28]:
df_growth.query('state == "Guam"')
Out[28]:
index state 2017totals 1998totals growth_rate
0 3 Guam 1372 9 151.444444
In [29]:
plt.figure(figsize=(20,10))
def growth_rate():
    return (df_growth['2017totals']- df_growth['1998totals']) / df_growth['1998totals']

plt.bar(df_growth['state'].head(10), df_growth['growth_rate'].head(10))
plt.ylabel("% of growth", fontsize=20)
plt.xlabel("State", fontsize=20)
# set chart title
plt.title("Top Ten highest growth in gun registration by state ", fontsize=20)
Out[29]:
Text(0.5, 1.0, 'Top Ten highest growth in gun registration by state ')

As seen in the chart above, Guam has the highest growth in gun registration follow by kentucy and Massacusetts.

Research Question 2 :What census data is most associated with high gun per capital?¶

Answer ;¶

The cenusus data provide data for only 2010 and 2016, so i will extract the data for 2010 and 2016 from df_gun data, then merge with census data using state as the foriegn key to get correlation for the data.

In [30]:
#creating a dfnew_gun
df_new_gun= df_gun.loc[:, ['month', 'state', 'totals']]

df_new_gun
Out[30]:
month state totals
0 2017-09-01 Alabama 32019
1 2017-09-01 Alaska 6303
2 2017-09-01 Arizona 28394
3 2017-09-01 Arkansas 17747
4 2017-09-01 California 123506
... ... ... ...
12480 1998-11-01 Virginia 24
12481 1998-11-01 Washington 361
12482 1998-11-01 West Virginia 408
12483 1998-11-01 Wisconsin 241
12484 1998-11-01 Wyoming 107

12485 rows × 3 columns

In [31]:
#extracting year and month from month column

df_new_gun['year']= pd.to_datetime(df_new_gun['month']).dt.strftime('%Y')
df_new_gun['months']= pd.to_datetime(df_new_gun['month']).dt.strftime('%m')

#dropping the month (date column)

df_new_gun = df_new_gun.drop(columns = 'month')
In [32]:
#extraction 2010-2016 data


df_new_gun = df_new_gun.query('year  == "2010"'and 'year == "2016"')
In [33]:
#grouping df_new_gun by state
df_new_gun = df_new_gun.groupby('state')['totals'].sum().reset_index()
df_new_gun.head()
Out[33]:
state totals
0 Alabama 616947
1 Alaska 87647
2 Arizona 416279
3 Arkansas 266014
4 California 2377167
In [34]:
#extracting census data for 2016 and 2010

#renameing the column as state

df_census_2016_2010 = df_census.iloc[:,0:4]

df_census_2016_2010.rename(columns = {'index':'state','Population estimates, July 1, 2016,  (V2016)' :'population_2016', 'Population estimates base, April 1, 2010,  (V2016)': 'population_2010' , 'Population, percent change - April 1, 2010 (estimates base) to July 1, 2016,  (V2016)': '%population_change2010and2016'}, inplace = True)


df_census_2016_2010.head()
Out[34]:
Fact state population_2016 population_2010 %population_change2010and2016
1 Alabama 4863300 4780131 1.70
2 Alaska 741894 710249 4.50
3 Arizona 6931071 6392301 8.40
4 Arkansas 2988248 2916025 2.50
5 California 39250017 37254522 5.40
In [35]:
#merging the df_new_gun and df_census

df = df_census_2016_2010.merge(df_new_gun, how = 'inner', on = 'state')

df.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 50 entries, 0 to 49
Data columns (total 5 columns):
 #   Column                         Non-Null Count  Dtype 
---  ------                         --------------  ----- 
 0   state                          50 non-null     object
 1   population_2016                50 non-null     object
 2   population_2010                50 non-null     object
 3   %population_change2010and2016  50 non-null     object
 4   totals                         50 non-null     int64 
dtypes: int64(1), object(4)
memory usage: 2.3+ KB
In [36]:
#changing the datatype to float
df['population_2016'] = df['population_2016'].astype(float)
df['population_2010'] = df['population_2010'].astype(float)
df['%population_change2010and2016'] = df['%population_change2010and2016'].astype(float)
In [37]:
#creating correlation table

#df_corr = df.corr()[['totals']][:-1].sort_values(by='totals', ascending = True)
df_corr = df.corr()
df_corr
Out[37]:
population_2016 population_2010 %population_change2010and2016 totals
population_2016 1.000000 0.999079 0.232673 0.605044
population_2010 0.999079 1.000000 0.205972 0.607005
%population_change2010and2016 0.232673 0.205972 1.000000 0.098395
totals 0.605044 0.607005 0.098395 1.000000
In [38]:
#plotting the correlation 

hm = px.imshow(df_corr, height = 1000, text_auto=True, aspect="auto")
hm.update_layout(title = 'Heatmap of census data correlation with firearm registrations', title_x = 1.0)
hm.update_xaxes(side="top")
hm.show()

Conclusions¶

what is the overall trend in gun purchase?: The result shows that there is increase in gun purchase over time from 1998 to 2017,the trend is seasonal which peak at december.

what state has the highest growth in gun purchase?: The result shows that Guam has the highest growth (151.444444%) in Gun registration, the total gun registration increase from 9 in 1998 to 1372 in 2017.

What census data is most associated with high gun per capital?: Both 2010 population and 2016 population census data have positive correlation with the total gun checkedin 2010 and 2016. 2010 population census has 0.605044 which is greater than 0.605044 of 2016 population census.

Limitation¶

The main limitation is the lack of census data that prevented more detailed analysis: the census data is available only for the period 2010 - 2016. Missing data in the NICS Gun-check dataset were removed.

In [ ]:
 
In [ ]: